data <- distinct(data) # removes duplicate rows
summary(data) # summarizes every column; numeric columns also report an NA count when values are missing
Date               Order.ID           Product.Name       Category
Length:30          Length:30          Length:30          Length:30
Class :character   Class :character   Class :character   Class :character
Mode  :character   Mode  :character   Mode  :character   Mode  :character
Price              Quantity.Sold      Total.Sales        Customer.ID
Min.   : 39.99     Min.   :1.000      Min.   :  89.99    Length:30
1st Qu.:142.49     1st Qu.:1.000      1st Qu.: 199.99    Class :character
Median :249.99     Median :2.000      Median : 334.99    Mode  :character
Mean   :337.32     Mean   :1.833      Mean   : 527.32
3rd Qu.:349.99     3rd Qu.:2.000      3rd Qu.: 689.98
Max.   :999.99     Max.   :5.000      Max.   :2399.97
Customer.Age       Customer.Gender    Payment.Method     Store.Location
Min.   :22.00      Length:30          Length:30          Length:30
1st Qu.:29.25      Class :character   Class :character   Class :character
Median :34.00      Mode  :character   Mode  :character   Mode  :character
Mean   :34.60
3rd Qu.:39.75
Max.   :48.00
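For character columns, summary() only prints Length, Class and Mode, so missing values in those columns would not show up in the output above. A minimal sketch of an explicit per-column count, using a small made-up data frame (`toy` and its values are hypothetical stand-ins, not the real `data`):

```r
# Hypothetical stand-in for `data`; the values are made up for illustration.
toy <- data.frame(
  Category = c("Electronics", NA, "Clothing"),
  Price = c(39.99, 249.99, NA),
  stringsAsFactors = FALSE
)

# is.na() flags every missing cell; colSums() adds the flags up per column.
na_counts <- colSums(is.na(toy))
print(na_counts)
```

Running the same two calls on `data` should give one count per column, character columns included.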
filtered_data <- filter(data, Store.Location == "New York") # filtered_data only contains sales made in New York
sorted_filtered <- arrange(filtered_data, desc(Total.Sales)) # sorted in descending order by total sales, so the highest total sale is at index 1
print(paste("The highest total sale in New York was recorded on", sorted_filtered$Date[1])) # R indexes from 1 rather than 0
[1] "The highest total sale in New York was recorded on 2023-01-27"
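The same lookup can probably be done without sorting, since only the single largest row is needed. A rough base R sketch, assuming a made-up data frame (`toy`, its dates and its amounts are hypothetical and are not the real sales data):

```r
# Hypothetical stand-in for `data`; dates and amounts are made up.
toy <- data.frame(
  Date = c("2023-02-01", "2023-02-02", "2023-02-03"),
  Store.Location = c("New York", "New York", "Chicago"),
  Total.Sales = c(199.99, 689.98, 999.99),
  stringsAsFactors = FALSE
)

# Keep only the New York rows, then pick the row with the largest Total.Sales.
ny <- toy[toy$Store.Location == "New York", ]
top_date <- ny$Date[which.max(ny$Total.Sales)]
print(top_date)
```

Note that which.max() returns only the first maximum, so if two sales tie for the highest total, only the earlier row is reported.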
freq_table <- table(data$Payment.Method) # counts how many times each payment method is used
most_used <- names(which.max(freq_table)) # the most used payment method is stored in most_used
print(paste("The most used payment method is", most_used)) # prints the most used payment method
[1] "The most used payment method is Credit Card"
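One thing to watch with which.max() is ties: it returns only the first maximum, so a second payment method with the same count would be dropped silently. A small sketch of how all tied methods could be listed, using a made-up vector (`payments` is hypothetical, not the real column):

```r
# Hypothetical payment methods; "Cash" and "Credit Card" are tied on purpose.
payments <- c("Cash", "Credit Card", "Cash", "Credit Card", "Debit Card")
freq <- table(payments)

# which.max() stops at the first maximum in table order.
first_only <- names(which.max(freq))

# Comparing every count against the maximum keeps all tied methods.
counts <- as.vector(freq)
all_top <- names(freq)[counts == max(counts)]

print(first_only)
print(all_top)
```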
hist(data$Customer.Age, # creates a histogram of customer age
     main = "Customer Age Histogram", # sets the histogram title to "Customer Age Histogram"
     xlab = "Customer Age") # sets the x-axis label to "Customer Age"
plot(data$Quantity.Sold, data$Price, # creates a scatterplot of quantity vs price
     xlab = "Quantity Sold", # labels the x-axis
     ylab = "Price", # labels the y-axis
     pch = 16, # filled dots; I don't like the default open circles
     main = "Relationship Between Quantity and Price") # sets a title for the plot
Figure 1: Relationship Between Quantity and Price
We can see in Figure 1 that items sold in larger quantities tend to cost less: as quantity sold increases, price tends to decrease.
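The visual trend could be backed up with a number. A minimal sketch using a rank correlation, which only assumes the relationship is monotonic, not linear. The data frame below is made up for illustration (`toy` and its values are hypothetical), so the result says nothing about the real data:

```r
# Hypothetical quantity/price pairs; not the real sales data.
toy <- data.frame(
  Quantity.Sold = c(1, 1, 2, 2, 3, 5),
  Price = c(999.99, 349.99, 249.99, 142.49, 89.99, 39.99)
)

# Spearman's rho is negative when price tends to fall as quantity rises.
rho <- cor(toy$Quantity.Sold, toy$Price, method = "spearman")
print(rho)
```

On the real data this would be `cor(data$Quantity.Sold, data$Price, method = "spearman")`. With only 30 rows, the value should be read as a rough indication.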